Data collection methods:
Administrative and spatial, observational, interviews, surveys

PSCI 2270 - Week 7

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

October 10, 2023

Concepts vs. Indicators

  • Theories are made up of concepts (nodes):

    • Inequality, civil discourse, media consumption, political knowledge, outgroup contact, views on immigration
    • When drawing diagrams of theories (DAGs), we took these concepts for granted
  • Concepts are latent:

    • We almost never observe “concepts”
    • Instead we rely on “indicators” or “proxies”
  • Indicators are concrete:

    • Concrete measure of a latent concept
    • Sometimes they’re “good,” sometimes they’re “rough”

Sometimes there is slippage

  • Important to consider how we construct indicators

    • Some more straightforward: what is your age? how often do you watch TV?
    • Others more complicated: political self-efficacy? racial discrimination?
    • Have to create an operational definition of a concept to make it into a variable in our dataset
  • Sometimes there is slippage between latent concept and proxy, e.g.

    • Responses to a specific policy question about affirmative action as a proxy for “racial resentment”
    • Outcomes measured via self-reports may be clouded by social desirability bias (e.g., self-reported voter turnout)
  • Important to make measurement as unobtrusive as possible

Measurement Error

  • Reliability:

    • Receive the same answer over repeated measurements
    • Individual measurement = exact value + chance error
    • Chance errors tend to cancel out when we average across a large sample
  • Validity:

    • Avoid systematic errors (bias) that push measurements in the same direction
    • Individual measurement = exact value + chance error + bias
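
The two decompositions above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the exact value, the spread of the chance error, and the size of the bias are all hypothetical numbers chosen for illustration.

```python
import random
import statistics

def simulate_measurements(exact_value, n, noise_sd, bias=0.0, seed=0):
    """Draw n measurements, each = exact value + chance error + bias."""
    rng = random.Random(seed)
    return [exact_value + rng.gauss(0, noise_sd) + bias for _ in range(n)]

# Hypothetical exact value, e.g. hours of TV watched per day.
exact_hours = 2.5

# Chance error only: noisy, but centered on the exact value.
unbiased = simulate_measurements(exact_hours, n=10_000, noise_sd=1.0)

# Chance error plus a systematic bias of +0.5 (e.g. consistent over-reporting).
biased = simulate_measurements(exact_hours, n=10_000, noise_sd=1.0, bias=0.5)

mean_unbiased = statistics.mean(unbiased)
mean_biased = statistics.mean(biased)

print(f"exact value:          {exact_hours}")
print(f"mean, chance error:   {mean_unbiased:.3f}")
print(f"mean, error + bias:   {mean_biased:.3f}")
```

In this sketch, averaging pulls the first mean toward the exact value, while the second mean stays off by roughly the bias no matter how large the sample is.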

Data Collection Methods



  • Document analysis: Use of any audio, visual, or written materials as a source of data

  • Interview data: Data that are collected from responses to questions posed by the researcher to a respondent

  • Firsthand observation: Data that may be collected by making observations in a field study or in a laboratory setting

Everything is Possible

  • Question: Which factors affect protest activity?
  • Case study: Lohmann (1994)
  • Process-tracing: Pearlman (2013)
  • Laboratory experiment: Young (2019)
  • Social network analysis: Larson et al. (2019)
  • Using original surveys: Boulianne and Lee (2022)
  • Field experiments: Bursztyn et al. (2021)
  • Original data on protests: Steinert-Threlkeld (2017)

Protests in space

Geolocation and spatial data


  • “Geographic Information Systems (GIS) in International Relations” by Jordan Branch

Geographic Information Systems (GIS) are being applied with increasing frequency, and with increasing sophistication, in international relations and in political science more generally. Their benefits have been impressive: analyses that simply would not have been possible without GIS are now being completed, and the spatial component of international politics—long considered central but rarely incorporated analytically— has been given new emphasis. However, new methods face new challenges, and to apply GIS successfully, two specific issues need to be addressed: measurement validity and selection bias. Both relate to the challenge of conceptualizing nonspatial phenomena with the spatial tools of GIS. Significant measurement error can occur when the concepts that are coded as spatial variables are not, in fact, validly measured by the default data structure of GIS, and selection bias can arise when GIS systematically excludes certain types of units. Because these potential problems are hidden by the technical details of the method, GIS data sets and analyses can sometimes appear to overcome these challenges when, in fact, they fail to do so. Once these issues come to light, however, potential solutions become apparent—including some in existing applications in international relations and in other fields.

What is GIS?



  • Geographic Information System (GIS): Any system for the collection and analysis of data that are coded spatially (by location). Generally involves the use of a GIS software package for the creation and analysis of spatial data.
  • Vector data: Points, lines, and polygons to describe spatial features: a point for a feature at a single location, a line for a linear feature such as a road, or a polygon for a feature that covers a definable spatial area.

  • Raster data: Pixels, predefined equivalent-sized units that are then assigned a value for a single variable across the entire area covered by the data.
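
The difference between the two data structures can be sketched in plain Python. This is an illustrative example under stated assumptions, not how a GIS package works internally: the district polygon, the protest site, and the grid size are hypothetical, and `point_in_polygon` and `rasterize` are helper names introduced here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, n_rows, n_cols, cell_size=1.0):
    """Convert a vector polygon to a raster: 1 if the cell center is inside."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = (c + 0.5) * cell_size
            cy = (r + 0.5) * cell_size
            row.append(1 if point_in_polygon(cx, cy, polygon) else 0)
        grid.append(row)
    return grid

# Vector data: a polygon (hypothetical district) and a point (hypothetical protest site).
district = [(1, 1), (4, 1), (4, 3), (1, 3)]
protest_site = (2.5, 2.0)
site_in_district = point_in_polygon(*protest_site, district)

# Raster data: equal-sized cells, each holding one value for a single variable.
grid = rasterize(district, n_rows=4, n_cols=5)
for row in grid:
    print(row)
```

The vector version stores only the features themselves, while the raster version assigns a value to every cell in the covered area, including cells where nothing of interest is located.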

Raster vs vector data



Figure sources: Peisakhin and Rozenas (2018); Korovkin and Makarin (2023)

Examples


  • Stasavage, David. 2011. States of Credit: Size, Power, and the Development of European Polities. Princeton, NJ: Princeton University Press.

    • Shape or scale of polity/country matters
  • Starr, Harvey. 2013. On Geopolitics: Space, Place, and International Relations. Boulder, CO: Paradigm.

    • Shared borders and interaction across them
  • Cederman, Lars-Erik, Kristian Skrede Gleditsch, and Halvard Buhaug. 2013. Inequality, Grievances, and Civil War. New York: Cambridge University Press.

    • Location and size of ethnic or rebel groups
  • What else?

Issues



  • Measurement validity:

    • Changes in meaning of boundaries over time
    • Changes in meaning of boundaries across space
  • Selection bias:

    • Exclusion of units based on inability to code them geographically

Discussion

What kind of geolocation or spatial data can we use to study determinants of protest activity?

Protests in text

Text as data


  • “Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges” by John Wilkerson and Andreu Casas

Text has always been an important data source in political science. What has changed in recent years is the feasibility of investigating large amounts of text quantitatively. The internet provides political scientists with more data than their mentors could have imagined, and the research community is providing accessible text analysis software packages, along with training and support. As a result, text-as-data research is becoming mainstream in political science. Scholars are tapping new data sources, they are employing more diverse methods, and they are becoming critical consumers of findings based on those methods. In this article, we first describe the four stages of a typical text-as-data project. We then review recent political science applications and explore one important methodological challenge—topic model instability—in greater detail.

Four stages of text analysis



  1. Obtaining text

    • web-scraping vs crowdsourcing
  2. From text to data

    • supervised vs unsupervised methods
  3. Quantitative analysis of text

    • analysis of counts vs predictions with machine learning
  4. Evaluating performance

    • gold standard vs out-of-sample validation
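
The four stages can be walked through on a toy example. This is a rough sketch only: the headlines, the hand-coded labels, and the keyword list are invented for illustration, and a simple keyword (dictionary) count stands in for the supervised or unsupervised methods a real project would use.

```python
import re
from collections import Counter

# Stage 1: obtaining text (hypothetical headlines standing in for scraped documents).
documents = [
    "Thousands march downtown in protest over fuel prices",
    "City council approves new budget for road repairs",
    "Students rally and strike against tuition increase",
    "Local team wins the regional football championship",
]
# Hypothetical hand-coded "gold standard": 1 = about protest, 0 = not.
gold_labels = [1, 0, 1, 0]

# Stage 2: from text to data (lowercase tokens, then word counts per document).
def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

doc_term_counts = [Counter(tokenize(doc)) for doc in documents]

# Stage 3: quantitative analysis (flag a document if any protest keyword appears).
PROTEST_TERMS = {"protest", "march", "rally", "strike", "demonstration"}

def predict(counts):
    return 1 if sum(counts[term] for term in PROTEST_TERMS) > 0 else 0

predictions = [predict(counts) for counts in doc_term_counts]

# Stage 4: evaluating performance (share of predictions matching the gold standard).
accuracy = sum(p == g for p, g in zip(predictions, gold_labels)) / len(gold_labels)

print(predictions)
print(accuracy)
```

A keyword list that fits four hand-picked headlines says little about performance on new text, which is why out-of-sample validation appears alongside the gold standard in stage 4.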

Four uses of text data



  • Classification: Unsupervised machine learning methods compare the similarity of documents based on co-occurring features

  • Scaling: Use texts to locate political actors on ideological space

  • Text Reuse: Explicitly value word sequencing in judging document similarity

  • Natural Language Processing: Moving from “whom?” to “who did what to whom?”
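
To see why text reuse methods value word sequencing, compare a similarity measure that ignores word order with one that does not. The sketch below is an assumption-laden illustration: the three sentences are invented, and Jaccard overlap of word trigrams is just one simple way to respect sequencing, not necessarily the method used in the article.

```python
def ngrams(tokens, n):
    """Set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Share of items that two sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical bill texts.
bill_a = "the agency shall report annually to congress".split()
bill_b = "the agency shall report annually to the senate".split()
shuffled = "congress to annually report shall agency the".split()

# Ignoring order: the shuffled text looks identical to bill_a.
bag_similarity = jaccard(set(bill_a), set(shuffled))

# Respecting order: the shuffled text shares no three-word sequence with bill_a,
# while bill_b, which copies most of bill_a's phrasing, shares several.
sequence_similarity_shuffled = jaccard(ngrams(bill_a, 3), ngrams(shuffled, 3))
sequence_similarity_reused = jaccard(ngrams(bill_a, 3), ngrams(bill_b, 3))

print(bag_similarity)
print(sequence_similarity_shuffled)
print(round(sequence_similarity_reused, 3))
```

Only the sequence-based measure separates genuinely copied language from text that merely uses the same vocabulary.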

Issues



  • Measurement reliability:

    • Unsupervised machine learning can produce different results across runs (e.g., topic model instability)
    • Also an issue with Large Language Models (e.g. ChatGPT)
  • Measurement validity:

    • Unsupervised Machine Learning and Large Language Models are black boxes
  • Selection bias:

    • We often do not have access to the full universe of documents (e.g. API restrictions)
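
The reliability concern can be reproduced in miniature. The sketch below uses k-means clustering on one-dimensional data as a stand-in for topic models, which is an assumption made for brevity; the data points and starting values are hypothetical. The same algorithm on the same data settles on different groupings depending only on where it starts.

```python
def kmeans_1d(points, init_centers, max_iter=100):
    """Basic k-means on numbers: assign to nearest center, update, repeat."""
    centers = list(init_centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return clusters

# Hypothetical data with three visible groups, but we ask for only two clusters.
data = [0, 1, 2, 10, 11, 12, 20, 21, 22]

run_a = kmeans_1d(data, init_centers=[0, 1])    # start near the low end
run_b = kmeans_1d(data, init_centers=[21, 22])  # start near the high end

print(run_a)
print(run_b)
```

Neither run is wrong by the algorithm's own logic, so the researcher has to check how much substantive conclusions depend on such arbitrary choices.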

Discussion

What kind of text data can we use to study determinants of protest activity?

References


Boulianne, Shelley, and Sangwon Lee. 2022. “Conspiracy Beliefs, Misinformation, Social Media Platforms, and Protest Participation.” Media and Communication 10 (4). https://doi.org/10.17645/mac.v10i4.5667.
Bursztyn, Leonardo, Davide Cantoni, David Y Yang, Noam Yuchtman, and Y Jane Zhang. 2021. “Persistent Political Engagement: Social Interactions and the Dynamics of Protest Movements.” American Economic Review: Insights 3 (2): 233–50.
Korovkin, Vasily, and Alexey Makarin. 2023. “Conflict and Intergroup Trade: Evidence from the 2014 Russia-Ukraine Crisis.” American Economic Review 113 (1): 34–70. https://doi.org/10.1257/aer.20191701.
Larson, Jennifer M., Jonathan Nagler, Jonathan Ronen, and Joshua A. Tucker. 2019. “Social Networks and Protest Participation: Evidence from 130 Million Twitter Users.” American Journal of Political Science 63 (3): 690–705. https://doi.org/10.1111/ajps.12436.
Lohmann, Susanne. 1994. “The Dynamics of Informational Cascades: The Monday Demonstrations in Leipzig, East Germany, 1989–91.” World Politics 47 (1): 42–101. https://doi.org/10.2307/2950679.
Pearlman, Wendy. 2013. “Emotions and the Microfoundations of the Arab Uprisings.” Perspectives on Politics 11 (2): 387–409. https://doi.org/10.1017/s1537592713001072.
Peisakhin, Leonid, and Arturas Rozenas. 2018. “Electoral Effects of Biased Media: Russian Television in Ukraine.” American Journal of Political Science 62 (3): 535–50.
Steinert-Threlkeld, Zachary C. 2017. “Spontaneous Collective Action: Peripheral Mobilization During the Arab Spring.” American Political Science Review 111 (2): 379–403. https://doi.org/10.1017/s0003055416000769.
Young, Lauren E. 2019. “The Psychology of State Repression: Fear and Dissent Decisions in Zimbabwe.” American Political Science Review 113 (1): 140–55. https://doi.org/10.1017/S000305541800076X.